newdocms: Beyond the Hierarchical File System
Manuel Arriaga writes "After two years of hard work (and many scrapped versions), I have just released an (ugly, but working!) preview version of newdocms, a completely new document management system. newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user, which provides a radically new way to store and retrieve documents. No longer will you browse complex directory trees or directly interact with the HFS; instead, you define any number of document attributes when saving a document and then query a database of those attributes when trying to retrieve it later on.
For the first time you have a true alternative to the hierarchical file system at the OS level. Through the modification of the KDE shared libraries, newdocms currently works with all KDE apps! (I am looking for volunteers to add support for GNOME and OpenOffice.org!) This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
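The save-with-attributes, retrieve-by-query idea in the summary can be pictured with a small sketch. This is not newdocms code: the table layout and the function names `open_store`, `save_document` and `find_documents` are assumptions made up for illustration, using Python's standard `sqlite3` module.

```python
import sqlite3

def open_store(path=":memory:"):
    """Create the (hypothetical) attribute tables if they do not exist yet."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS docs "
               "(id INTEGER PRIMARY KEY, location TEXT UNIQUE)")
    db.execute("CREATE TABLE IF NOT EXISTS attrs "
               "(doc_id INTEGER, name TEXT, value TEXT)")
    return db

def save_document(db, location, **attributes):
    """Record where a document lives and the attributes the user gave it."""
    cur = db.execute("INSERT INTO docs (location) VALUES (?)", (location,))
    db.executemany(
        "INSERT INTO attrs (doc_id, name, value) VALUES (?, ?, ?)",
        [(cur.lastrowid, name, value) for name, value in attributes.items()],
    )
    db.commit()

def find_documents(db, **wanted):
    """Return the locations of documents matching every requested attribute."""
    query = "SELECT location FROM docs"
    params = []
    for name, value in wanted.items():
        # The first condition starts the WHERE clause, later ones are ANDed on.
        query += " WHERE" if not params else " AND"
        query += " id IN (SELECT doc_id FROM attrs WHERE name = ? AND value = ?)"
        params += [name, value]
    return sorted(row[0] for row in db.execute(query, params))
```

The user never types a path here: `save_document(db, "/store/0001.doc", project="DOJ", kind="report")` files the document, and `find_documents(db, kind="report")` brings it back. The real system presumably does far more (dialogs, KDE integration), none of which is modelled.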
Pr0n FS (Score:4, Funny)
D
I already use a different one: (Score:5, Interesting)
Re:I already use a different one: (Score:4, Insightful)
This is completely untrue. There are lots of other options (like The Brain) that have been out for a while that have nothing to do with "free software". Hell, the fact that other proprietary systems (that are better, in my opinion) came out earlier shows that not only is "free software" irrelevant in this discussion, but it actually lags behind software driven by the profit model.
Re:I already use a different one: (Score:2, Interesting)
How is The Brain able to dynamically relink all your existing applications to deal with this transparently, when you File->Open inside of them?
(Assume it uses some crazy undocumented Windows trick.) How are we to resolve incompatibilities between The Brain and a software program that has already messed with the Windows file open interface in its own way? Praying and waiting for the two developers to sign mounds of NDAs seems like the only option. And even then, there's no guarantee it's going to get addressed.
Re:I already use a different one: (Score:3, Interesting)
How about instead we assume it uses the well documented Pluggable Asynchronous File System Driver API? So it works with all your existing Win32 applications transparently in a very normal way. Your post is pure FUD.
Not quite the same thing (Score:2, Insightful)
I believe the point that this mad scientist was making was that he's completely replaced the FS with this new database-based one.
It's certainly not innovative, but it's something different I guess.
Re:Not quite the same thing (Score:2)
newdocms isn't a file browser: it is a layer between the hierarchical file system (HFS) and the user. It didn't explain it very well on the web page, but I'd assumed that they both did the same thing... give a different interface between the user and the existing file system.
Re:I already use a different one: (Score:2)
Interesting... (Score:5, Insightful)
Anyway, an interesting concept.
Re:Interesting...But Why? (Score:3, Insightful)
Exactly. Users STILL have to create their own type of organization.
/documents contains documents. Duh.
/documents/work contains documents for work.
The problem is people don't want to be organized, so they look to technology to help them be lazy. Plus try explaining 'metadata' to someone. At least now you can use the file cabinet, drawers, folders, papers example to explain the layout to someone.
agree (Score:4, Insightful)
Re:Interesting... (Score:5, Interesting)
That's the whole reason for the program -- you shouldn't have to remember long, detailed folder structures and filenames in order to retrieve a file you were looking for.
I can't tell you how many times I've had to help users find some file, shortcut, document or spreadsheet that they've "lost" because they forgot the correct path. But they do remember it involved a loan, or it involved a party announcement, or something similar. I swear, just the other day I spent an hour waiting on another employee to get off the phone so I could find a folder shortcut another employee had lost. She wasn't sure what folder the shortcut referred to, but she knew it contained documents of a certain type.
Do you see a pattern here? To me, this sounds just like what Microsoft is trying to do with Longhorn, and potentially Office 11. People are tired of searching and hunting through folders and hierarchies full of oddly named files and temp folders that can confuse Joe User.
This is awesome software and definitely a step forward. It might not change the geek community, but it will certainly help out system admins of the world. While your method still works (and hopefully, in the future, these two systems should work hand-in-hand, but that's another project I suppose), this is a damn fine alternative.
Re:Interesting... (Score:5, Insightful)
When it comes to that, users just need full text indexing of their documents so they can do full text searches more quickly. I dunno about Windows, but we've definitely got that in Mac OS.
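For anyone wondering what "full text indexing" amounts to, here is a minimal sketch of an inverted index. It is an illustration only, not how Mac OS or any other system does it; `build_index` and `search` are made-up names, and real indexers also handle stemming, file formats and incremental updates.

```python
import re
from collections import defaultdict

def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result
```

The index is built once, so each search is a few set lookups instead of a scan over every file. That is the speed-up being asked for.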
Re:Interesting... (Score:5, Insightful)
Great for writers, not so good for graphic artists. I sysadmined for a few years in a graphics/video shop that had tens of thousands of images on the various fileservers. I essentially wrote a very simple version of this "DB on top of FS" idea because I was tired of helping people find their TIFFs.
Yes, /home/projects/DOJ/annual_report/masters is just one piece of metadata, and some people find that easier to remember than several keywords. OTOH, suppose two years later you want to reuse that image of the Hispanic male using a computer. Was that in /home/projects/DOJ/annual_report/masters or /home/projects/USDA/website/images ?
My solution (and, it would seem, the article's, though I'm sure that one is a lot more robust), was to keep the users away from the FS completely. Just let them bring up all the images tagged with "hispanic male computer." Most graphics shops I've seen either built a DB file manager or bought one.
Honestly, I think the idea of computers holding a lot of "files" organized into "directories" is a little old. It was great in 1970 but maybe (like this guy is doing) we should rethink it a little. Why not say a computer has certain knowledge ("files") and certain capabilities ("executables")? Rather than naming files, describe the data you want the computer to retain, and retrieve it later from that description.
As somebody pointed out, Office2K/XP and W2K/XP have something like this already, but people don't use it because they still have to name files. That's the crucial step, I think, and that's why I took that power out of my users' hands. They never named files; the app did it for them. Instead, they described files and versions. Abstraction and all that...
Anyways, this idea may not help everybody, but it sounds like my old users would have liked it (they, btw, were very good about using specific and accurate keywords... no QWERTY effect here; they just didn't think in terms of files and directories). Plus, it's nice to see somebody trying to move past the "files and directories" mindset we've had for the past 3 decades.
Re:Interesting... (Score:3, Informative)
Not sure it's any better... (Score:5, Interesting)
Everyone seems hot to SQL the file system, and while I think that will be the way of the future, I don't think that there is a clear view of how that works from the user's perspective yet. Remember that this is a rather large paradigm shift from what everyone is used to. It's going to take a while for this to mature to the point that Joe User is going to be able to hack it. I mean, I looked at the Save As dialog on that page, and while it looks cool it also looks counter-intuitive to me and I'm a developer! How much more will a user get confused?
All in all we're going in the right direction, but by no means are we anywhere near the goal yet.
Ben
Re:Interesting... (Score:2)
I agree. I'm skeptical of all these UI ideas which start with: "Poor User, he's too stupid to remember filenames, or create a hierarchy that classifies files properly."
I don't know how many times I've tried to re-find something on the web with Google, but just can't come up with the right search terms to bring it up. That's what happens to me when I think that I won't need the URL, I'll just remember some keywords...
Plus, the filesystems I have trouble with won't be helped by this. At work, how will a BA tell me where he's stored the requirements on the shared filesystem? I suppose he crafts a query which returns just one single document. But then how's that easier than a filename?
I just don't get it. In any case, even if it does catch on with joe sixpack, it won't with me.
Coming soon from M$? (Score:2, Interesting)
Oh well, in a few years the *n?x-philes will be screaming about M$ stealing their ideas. Figures.
Came ages ago from Big Blue? (Score:2)
I find it a nuisance from a programmer's point of view, and indeed it can get quite messy when you have about 100 different Libraries on your 400, each with a few dozen Objects, some of them with another few hundred members etc etc. From the point of view of the application, and the end user (who will typically have only a single version of a few applications installed on his dedicated database server), it is the greatest thing since sliced silicon.
I think it predates the hierarchical filesystem by a lifetime as well (again in Moore's law years).
Remind anyone of something? (Score:5, Interesting)
LIAR! (Score:5, Funny)
Re:Remind anyone of something? (Score:3, Insightful)
Standard Slashdot Clue-Slap #4: The Fallacy of Mass Hypocrisy
If you walked into PNC Park during a game, and saw a group of 10 people wearing Braves jerseys, would you call the remaining 38,000+ Pirate fans* in the crowd hypocrites? What about a vegetarian eating a salad at a steakhouse?
What you're observing is not hypocrisy on the posters' part. They're willing to join the debate, and they deserve credit for that. (You imply that much with your preemptive taunt to anyone who would mod you down.) It's just human nature getting the best of the moderation system. It's too easy to silently and anonymously squelch a valid dissenting opinion. And while meta-moderation can cull out the egos and zealots, it operates too slowly to keep up with short tempers.
*: Jokes about the Pirates selling out a home game > /dev/null :-)
To Do list? (Score:3, Interesting)
Filing things by categories rather than location.
Interesting, but you can imply categories by location and thus, to me this is overhead.
Sounds familiar... (Score:3, Interesting)
Anyway, it's encouraging to see that an open source initiative is capable of doing things now that Microsoft plans for a long, long, far away future.
Re:Sounds familiar... (Score:4, Insightful)
BeOS, anyone...?
Cheers,
Ian
Or... (Score:3, Informative)
[conspiracy_theorist_mode="on"]
Anno
[conspiracy_theorist_mode="off"]
Se
How is this different from... (Score:2)
sPh
looks like very high quality work, but... (Score:4, Insightful)
It would be ideal if the computer -- the thing that is supposed to make life easier -- did the classification. Until that happens I cannot see myself even considering such a file access method.
Re:looks like very high quality work, but... (Score:3, Interesting)
While I do think the work presented is a great idea, it seems to me that it's a lot of effort just to set up the system.
That's pretty much the problem with meta-data based file systems. They're great for new projects, where you have a clean start and can actually add metadata to the files. The real problem is legacy data.
My home directory weighs in at just under five gigabytes, and has files dating back over ten years, and that's just the "personal stuff". My work partition has about eight gigabytes, which is mainly source code.
I'm really not going to be able to associate metadata with every individual file by hand, at least not until automatic tools come along that will data mine the file content and automatically do some minimal level of association.
On top of this a whole new generation of development tools needs to be written. At a very basic level you need a version of make that will build all C source files on the disk with associated meta data "Belonging to Project X, dated no later than last week".
When you think about it you'll realise that while as a concept it's fairly powerful, we won't be switching to using this sort of thing soon. For the same reasons the semantic web [w3.org] and RDF are having problems getting adopted, metadata based file systems face real problems before people will start widely adopting them...
Al.
Look at the save dialog (Score:3, Insightful)
Commercial Innovation (Score:2)
Didn't BeOS have this years ago (Score:5, Informative)
"This is a testament to the power of free software: this sort of innovation could never happen if it weren't for the free software nature of the underlying systems."
... or not. As I recall, BeOS had a fully functional database driven file system, although it did not entirely throw out the hierarchical side of things either (probably a good decision in my opinion). In fact, I recall reading a while back that future versions of Windows were supposed to have database driven file systems as well.
While free software is great, let's not get too cocky about what kind of innovations it can produce when we aren't aware of what the traditional software companies have already done.
Re:Didn't BeOS have this years ago (Score:3, Insightful)
But this is the first time I've seen it implemented in userland.
Re: submitter's cockiness about innovation, I think it's simply a pumped up way of saying "if I hadn't had the source, I couldn't have done this hack". No shit.
Maybe it's just me, but I think it would have been truly more clever if it had been implemented using a stacked filesystem, or even a hacked open(2).
Re:Didn't BeOS have this years ago (Score:2)
Give BeOS a try sometime.
Historical Q (Score:5, Insightful)
The biggest problem with folders is no one wants to be a file clerk and weed, sort, and file their docs. The act of socking away a doc should be as mindless as possible, not because (all) users are mindless but because they have better things to do, and shouldn't spend a minute adding keywords to every doc they might never see again.
You know how it is -- you're searching and coming up with junk, and want to yell at the computer, do what I meant, not what I said! This would be one of my first picks for AI on a personal computer.
I agree folders don't cut it, though as a metaphor for explaining the tree it's not bad. The problem is the tree.
Re:Historical Q (Score:2)
because folders and trees go together like... leaves and filing cabinets!
Re:Historical Q (Score:3, Interesting)
The file selection widget (FSW) is a core element of any high-level toolkit, and yet I've never seen one that provided any kind of utility that I need to make a filesystem work well in a GUI.
For starters, all FSWs should have memory, and they should understand what they're being used for. All of my graphics apps should "remember" where the last graphics app saved a file and default to that directory. Same goes for opening a file. Or office apps.
They should also have a history pull-down.
We also need a graphical abstraction for the filesystem (other than the MS-like horizontal tree) that customizes itself through use. If, for example, there are three directories that I load and save files to/from all the time, they should be the most obvious and accessible things in the tree.
Do these things, and graphical interaction with a filesystem makes sense.
As for a metadata filesystem, I think there's utility in it to some extent, but unless "rm" understands it, and it's easy to use from that level too, it's useless to anyone who really USES a UNIX(-like) system.
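The "FSW with memory" wish list above (a per-purpose default directory, a history pull-down, most-used directories made prominent) fits in a few lines. This is a toolkit-independent sketch; the class name `FileDialogMemory` and its methods are invented here and do not belong to any real widget library.

```python
from collections import Counter, deque

class FileDialogMemory:
    """Remembers where each kind of application last opened or saved files."""

    def __init__(self, history_size=10):
        self.last_dir = {}                          # app category -> last directory
        self.history = deque(maxlen=history_size)   # most recent directory first
        self.use_counts = Counter()                 # directory -> times used

    def record(self, category, directory):
        """Call this whenever a dialog is confirmed."""
        self.last_dir[category] = directory
        if directory in self.history:
            self.history.remove(directory)          # move it to the front instead
        self.history.appendleft(directory)
        self.use_counts[directory] += 1

    def default_dir(self, category, fallback="~"):
        """Directory the next dialog for this category should start in."""
        return self.last_dir.get(category, fallback)

    def favorites(self, n=3):
        """The n most frequently used directories, for prominent display."""
        return [d for d, _ in self.use_counts.most_common(n)]
```

Keying the memory on a category such as "graphics" or "office" rather than on the individual program is what lets every graphics app start where the last graphics app left off. A real version would also have to persist this state between sessions.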
Re:Historical Q (Score:3, Informative)
Re:Historical Q (Score:3, Informative)
Anyway, we did a lot of other cool stuff at Xerox in the 80s. There were two other information management systems that used non-hierarchical organizations. The Analyst (implemented in Smalltalk-80) and NoteCards (in Lisp) both had lattice file systems. You could create arbitrary links from one item to another, with lots of different kinds of links each with its own semantic meaning. It was an amazingly powerful way to navigate your files.
Why go to all that trouble? Because we found that it didn't matter how carefully people filed stuff away, they always were losing things. So the important thing was to make it as easy as possible for people to find their files, either by browsing or searching. In The Analyst, a document could be linked to by multiple folders, keywords, or other documents. The browser and search tools took advantage of the richness of linkages to make finding things easy. You just had to remember a few things about the item to locate it, rather than having to recall
This system would demand a lot of discipline... (Score:4, Insightful)
Re:This system would demand a lot of discipline... (Score:5, Insightful)
IT staffer: "That's the 3rd quarter financial report? You should click 'Financial', 'Quarterly', 'Company-wide', and 'Public'."
Secretary: "I already named it T42f.doc. Get it? 'T' for third. '4' for quarter. '2' for 2002. 'f' for financial - 'F' is for filing."
IT staffer: "But no one but you can find it!"
Secretary, with a wink: "Hmmm... I never thought about that."
I'm really not joking. If you can't get people to use filenames like "Preliminary quote to Foo, Inc. for widget sales 2002-12-23.doc", why are they going to bother picking those attributes from a menu?
How about this: Give the users a palette of choices (with the ability to add more as required), and generate the filename based on their choices. Don't even give them the option of whipping up their own personal hash table - make them let the program come up with reasonable names for everything. You could even set a threshold, such as "At least one attribute from each category must be checked", or "every file must have at least 4 attributes".
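A rough sketch of that last suggestion: the program builds the filename from the palette choices and refuses to save until enough attributes are picked. The function name `make_filename`, the category names and the underscore-joined format are all assumptions for illustration, and nothing here makes the resulting names unique.

```python
import re

def make_filename(choices, extension, min_attributes=4):
    """Build a filename from palette choices (category -> chosen attribute).

    Raises ValueError when fewer than min_attributes were selected, which is
    the "every file must have at least 4 attributes" threshold.
    """
    picked = [(cat, val) for cat, val in sorted(choices.items()) if val]
    if len(picked) < min_attributes:
        raise ValueError(
            f"need at least {min_attributes} attributes, got {len(picked)}")
    # Keep letters, digits and hyphens; collapse anything else into a hyphen.
    parts = [re.sub(r"[^A-Za-z0-9-]+", "-", val).strip("-") for _, val in picked]
    return "_".join(parts) + "." + extension
```

With the IT staffer's four clicks, `make_filename({"topic": "Financial", "period": "Quarterly", "scope": "Company-wide", "audience": "Public"}, "doc")` gives `Public_Quarterly_Company-wide_Financial.doc`. Attributes are ordered by category name so the same choices always produce the same filename.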
Proprietary file formats (Score:2)
Mom and Dad file system (Score:2, Insightful)
BeOS already did this... (Score:2, Insightful)
i used it and it works like a charm.
of course hierarchical file systems are easy to use, you can name folders after categories, and they are easy to backup.
Interfacer.
all your HFS are belong to us.
Whooooah!! (Score:3, Funny)
But thankfully, it's an article about file systems.
Prior art? (Score:3)
If memory serves me correctly, the BeOS team was originally trying to do a pure database filesystem (no hierarchy), but found (in the early '90s) that the performance hit was too heavy on the hardware of the time.
Thinking of new metaphors (Score:2, Insightful)
But computers are becoming ubiquitous, pervasive. Perhaps a new metaphor could be found. An example could be objects in rooms. Think of different folders as different rooms - all files (or rather, all streams) are objects in those rooms. Navigation between rooms is possible through doors.
Of course, as others have pointed out, the HFS ain't broken, so why fix it? (Answer: why not? PC cases aren't broken, but we still have case-modders, don't we?)
Data volume (Score:2, Insightful)
I write a lot of documents and my filing system becomes ever more difficult to manage. Without the skills that a librarian or filing secretary has, I find that my documents become harder to locate over time. To me this is a potential solution to that problem. I do, however, appreciate that "Joe Bloggs" will not understand what it is about, but as far as I am concerned "Joe Bloggs" should not be using computers in the first place. Pandering to his ilk has set computing back 10 years.
The potential pitfall of this system could be where many documents have been written about the same subject i.e. testresult001.txt to testresult999.txt. The user would know with the traditional system that he wants testresult823.txt but with the new system would be presented with 1000 choices. I am possibly being myopic here!
Perhaps it is time for a new paradigm and I for one will be looking at this method with great interest.
Sounds like... (Score:2)
If so, and based on the bad Sharepoint implementations I have seen, this seems unlikely to be world-shaking. How about a follow-up article on this in a year?
Amazing. (Score:3, Interesting)
Some uses I imagine
- Create music playlists on the fly (MoodLogic doesn't count)
- Categorize work files (Across the whole partition, find images that serve as bumps, HDRI
- Install Windows and service packs first, mark files as "windows native". Then install apps. Some OS glitch and you need to reinstall? Back up all files (with directory structure) that don't have the "windows native" tag, along with c:\program files and the registry. Reinstall Windows, restore the backed up files. Voila, no app installations required.
Re:Amazing. (Score:2)
Re:Amazing. (Score:2)
Sorry, last clarification.
You can do this for Windows now as well.
Simply boot into Linux and tag. Then while reinstalling win, do the above from Linux.
Plz don't forget E-Mail and Web documents (Score:4, Insightful)
a) Web browsing
It should know the sites you've visited, know your bookmarks and allow you to open everything you have found with a simple click.
b) E-Mail.
When it finds an E-Mail a simple double-click should be enough to open it in your mail, show you the thread it belongs to, etc.
I guess, that I'm not the only one, who has more important things in mails than in
Bye egghat.
Reiser4 anyone? (Score:2, Interesting)
Classic Example (Score:2, Interesting)
These abstraction layers have been used before on OSes such as Mac OS and OS/2. The problems always came into play when you pass the files around. There is always a step that strips the extended information. The key is wide acceptance and establishing a standard for the data storage. Be sure there is a way to pass the extended data in a text format (e.g. XML) when you want to store the files on a non-supported system (or so command line tools can be easily modified to update the db).
The idea is good and I am sure it will be very useful to a lot of people. Good Luck.
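The "pass the extended data in a text format" advice above could look something like the following sketch, which writes a document's attributes out as a small XML snippet and reads them back. The element names `document` and `attribute` are invented for this example and are not an agreed standard or anything newdocms defines.

```python
import xml.etree.ElementTree as ET

def attributes_to_xml(filename, attributes):
    """Serialize a file's attributes to an XML string that can travel with it."""
    root = ET.Element("document", {"name": filename})
    for name, value in sorted(attributes.items()):
        ET.SubElement(root, "attribute", {"name": name}).text = value
    return ET.tostring(root, encoding="unicode")

def attributes_from_xml(xml_text):
    """Parse the XML back into (filename, attributes) on the receiving side."""
    root = ET.fromstring(xml_text)
    attrs = {el.get("name"): el.text or "" for el in root.findall("attribute")}
    return root.get("name"), attrs
```

Stored as a sidecar file next to the document, such a snippet survives a trip through a system that knows nothing about the database, and a command line tool could use it to update the db on arrival.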
Just what we need. (Score:2)
Commercial Innovations (Score:2)
Surely this is an overstatement. I think what you mean is that a guy off the street couldn't add this file navigation scheme to an existing commercial OS, not that the commercial developers themselves couldn't do it. Or are we now suggesting that the open software movement is the sole owner of the term "innovation"?
This should be implemented at the FS level (Score:4, Insightful)
Right.
This is astoundingly bad software engineering.
Manuel, when your software fails, and it will, and somehow that db file gets trashed, you've rendered that user's files a huge heap of unsorted data. Effectively, that would be 100 times worse than never implementing your system, rather than 10 times better. No matter how bulletproof you think your code is, it probably isn't 100% perfect, so having all your eggs in one basket is unwise to say the least.
Even if your code is 100% perfect this is a mistake. What happens when a sector goes bad and this file is trashed? What happens when the first really dangerous Linux worm makes it a point to delete *.db from the filesystem?
Give the files names that are coded with human readable attribs! Double up that db file! Jesus, man... build SOME kind of redundancy in your system before you throw away the old way of storing the data.
There's a reason why there is such a scramble to implement a general attribute system at the FS level on many FS projects right now(*). The time has come for OSS to start being smart about this, but cramming all your metadata into a single file and throwing the backup out the window is just a very, very poor idea.
(*) BeOS was, yet again, way ahead of its time with BeFS.
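One way to read the "names coded with human readable attribs" suggestion: put the attributes in the filename itself, so the index can be rebuilt from a plain directory listing if the db file is lost. The `title[key=value,...].ext` format and the function names below are hypothetical, and the sketch assumes titles and values never contain `[`, `]`, `,` or `=`.

```python
def encode_name(title, attributes, extension):
    """Embed the attributes in the filename, e.g. 'q3[kind=financial].doc'."""
    tags = ",".join(f"{key}={value}" for key, value in sorted(attributes.items()))
    return f"{title}[{tags}].{extension}"

def decode_name(filename):
    """Recover (title, attributes, extension) from an encoded filename."""
    stem, _, extension = filename.rpartition(".")
    title, _, tags = stem.rstrip("]").partition("[")
    attributes = dict(tag.split("=", 1) for tag in tags.split(",") if tag)
    return title, attributes, extension

def rebuild_index(filenames):
    """Reconstruct the filename -> attributes mapping without the database."""
    return {name: decode_name(name)[1] for name in filenames}
```

The database then becomes a cache that speeds up queries rather than the only copy of the metadata, which is the redundancy being asked for.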
Doc Management (Score:3, Interesting)
Metadata? SharePoint has it too (Score:3, Informative)
But it's MS and here I am burning karma for just mentioning it. Big deal, I can spare the karma
More radical please (Score:3, Interesting)
The system I have been dreaming of for a while would be far more graphical (had a quick look at thebrain.com, it's still text with a few lines as far as I can see).
My dream system would enable you to specify file attributes such as size, path(s), name, type etc, as well as regex greps on the content, and then plot the filing system in 3D space, through which you could move with a joystick. You would be able to assign attributes to graphical features, eg make scripts cuboid, text files spherical, bigger files bigger on a logarithmic scale and so on. Related files would appear like solar systems, and by changing the importance of the file attributes you could change the way the files grouped.
Probably not what you'd want to use every day, but I'm sure I'd find a few mislaid files with such a system.
Re:More radical please (Score:3, Funny)
If I can't text process it, then I don't want it (Score:2, Insightful)
I like databases, but for some things they should not be used!
When it comes to the OS, I want to be able to text process data EASILY...with BASH! This road leads to things like binary configuration files and that leads to things like the Microsoft registry which I detest.
Databasizing everything (including the filesystem) IS NOT THE ANSWER
Re:If I can't text process it, then I don't want i (Score:4, Interesting)
I've found myself many times wishing I could just type "select location,filename from datastore where contents like '%resume%'"
SQL comes much more naturally to me than the find command does. I would love an easier way to index the contents of every file on my system by an arbitrary number of metadata and then have that accessible via a simple SQL statement.
I remember Scot Hacker did something similar with BeFS and his webserver at some point but he's long gone as is BeOS.
Am I the only one that this makes sense to?
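For what it's worth, the query wished for above can be tried out today against a toy table with Python's standard `sqlite3` module. The `datastore` table and its columns come from the wish itself; the helper names `build_datastore` and `find_by_contents` are made up, and a real tool would have to crawl the disk to fill the table.

```python
import sqlite3

def build_datastore(files):
    """Load (location, filename, contents) tuples into an in-memory table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE datastore (location TEXT, filename TEXT, contents TEXT)")
    db.executemany("INSERT INTO datastore VALUES (?, ?, ?)", files)
    return db

def find_by_contents(db, word):
    """Run the wished-for query: which files mention this word?"""
    return db.execute(
        "SELECT location, filename FROM datastore "
        "WHERE contents LIKE ? ORDER BY location, filename",
        (f"%{word}%",),
    ).fetchall()
```

A `LIKE '%...%'` pattern forces a scan of every row, so on a whole disk this would want a proper full text index behind it. The interface, though, is exactly the one-line SQL statement asked for.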
The HFS *is* a database (Score:3, Insightful)
Other query operations are supported such as wildcard characters and, in large OSes other than Unix, a variety of other attribute queries (a la "/usr/bin/find" but accessible from "ls").
Now the file table itself is a database, which can be readily implemented using a relational database. Microsoft NT and other OSes have had such support for quite a while now.
I'm glad to see the full relational database FS model starting to hit the mainstream. By this time researchers are looking into XML based File Systems (store metadata in XML-like syntax, support any XML query on the files).
Which brings us back to an often overlooked fact. Linux has, in general, not been at the leading edge of OS research (with the possible exception of the Beowulf architecture). This is alright as for many years the goal of Linux was to reimplement Unix on the Intel 386 architecture. However we must keep in mind that the really advanced OS features out there have yet to make it into Linux, things such as new environment metaphors, persistent data support, and intelligent user interactions.
Windows groundwork (Score:5, Informative)
You want metadata on files? NTFS streams give you a place to store metadata (much like Mac resource forks but with any number of named streams).
You want to search on the metadata? The Microsoft Indexing Server will build a database and let you search on it (though it's a very strange system to use - in XP go into Administrative Tools, Computer Management, Services and Applications, Indexing Service, System and click on "Query the Catalog"). You can do instant searches for all kinds of stuff; look at the help.
OLE Structured Storage is like a single file version of the filesystem we're talking about - a way of saving a bunch of objects (some of which you didn't create but that are in your document) into a file. I believe Microsoft's Office apps use it (could be wrong there though).
Right-click on an MP3 file and pick Properties in XP and go to the Summary tab. There's the metadata - the stuff the index server is going to index. If you add a new file format to the system, you can supply a DLL that will be able to supply the metadata for those files - so you download an MP3, save it on your disk, and the index server uses the DLL to get the metadata and add it to the database. It works pretty well.
I don't really have a point to all this, just listing some stuff that Windows has that "should" make it easy for Microsoft to add the OO FS someday and have it instantly work with existing apps.
- Steve
Use Case Scenarios (Score:3, Interesting)
I'm your average home user, but even so I have about 100 documents I work on. However, I was smart enough to give them meaningful filenames and locations where it takes only a few seconds to find the file. Remembering attributes for each and every file would be a pain.
Case 2:
I'm a developer. I'm sorry, but I want file Y in F/O/O/BAR. I need something exact to describe where a file is at least. Anything else doesn't work.
Case 3:
I'm a moron who doesn't give a flying-f*** about where I put my files, and I don't care what I name them. I already have documents in my C:\, C:\Windows/Temp, C:\sdf34\, and C:Documants. It takes me a couple minutes or two to find a file. What? I have to classify by keyword now? Who do you think I am? It needs to classify the files for me or I won't have any of it.
Case 4:
I'm a scientist/business man that deals with classifications on a day to day basis. I already have a database because I needed it to be efficient. If it was on the file system level, then it'd be pretty cool.
I can't think of any other positive cases where this product is useful. Thus, it's my bet that it'll be niche forever. Anybody got any other use cases that I'm obviously missing?
Re:What's wrong with hierachical systems anyway? (Score:5, Funny)
Well, they're pretty darn hard to spell, for one thing.
Re:What's wrong with hierachical systems anyway? (Score:5, Interesting)
Re:What's wrong with hierachical systems anyway? (Score:5, Interesting)
Re:What's wrong with hierarchical systems anyway? (Score:5, Funny)
Unfortunately you overlook the fourth and largest group -- those who COMPLAIN about everything and do nothing.
Re:What's wrong with hierachical systems anyway? (Score:3, Insightful)
We still have to deal with file systems on some level. What happens to your abstracted layer when you want to copy something to a disk or burn a CD? You can't perform a file copy without breaking the abstraction, so the abstraction is broken before you begin to use it.
When you insert a Drivers CD in Windows, it may auto-run, shielding you from the (often arcane) filing of the drivers. But unless there is an agreed format for the meta-data, your computer may not understand what is on the disc.
The system he proposes also breaks down on anything that is not new and made by the user. Document storage. Do we then only abstract the Documents folder?
While document management is a good idea, it needs to be subtle. It may take a user some time to learn the system, but that is better than crippling it to ensure first-time user ease. Macs used to come with several Tutorials on how to use the mouse and interact with the OS. We will probably need tutorials of that type again, soon.
Document management needs to spend very little time taking the user away from work. It must be integrated with the file system to work adequately, or the "switching" people will have to do to move from managed to unmanaged filing will aggravate and confuse them.
Folders (Score:3, Interesting)
Re:Folders (Score:3, Interesting)
If you explain to them that it's just a "box with a label on it", most of them do get it. They know boxes, they know labels and they do realise you can put a box in a box in a box (Russian puppets - forgot the name).
It all comes down to how organized someone is. If you are organized, you will grasp the concept of a directory tree (my mom does, she is over 50 and didn't touch a computer until last year). If you are unorganized, you will lose your files anyway. Consider this: you save your spreadsheet today as "Yearly Report 2002", and two days later, when you want to call it back, your mind just doesn't say "Yearly Report 2002", but more like "Financial Data last year". Then your nice database-filesystem won't find it either. Unless there is some serious AI backing it.
Re:Folders (Score:2, Insightful)
um, he took care of this.
try reading the article next time.
(am I feeding a troll if they're marked +2 Interesting?)
Re:Folders (Score:4, Interesting)
Babushkas. If you want some, there's always Google.
Consider this: you save your spreadsheet today as "Yearly Report 2002", and two days later when you want to call it back, your mind just doesn't say "Yearly Report 2002", but more like "Financial Data last year". Then your nice database-filesystem won't find it either. Unless there is some serious AI backing it.
Now that would be an interesting file storage abstraction. I've played with the idea of a relational file structure, that would enable one to save meta-information on a file and later find it by information that relates to it. Implemented correctly, you could save your "Yearly Report 2001" and later find it by asking for "financial data two years ago". Something that combines newdocms and ThoughtTreasure. [signiform.com]
ThoughtTreasure(TM) is a relational information storage handler combined with a (semi-)intelligent AI. You can supply information like "Peter loves Paul" and "Paul hates Cahtrine." You can then ask questions like "Who does Peter like?" and "What relationship is there between Paul and Cahtrine?" If you say stuff like "Peter dislikes Paul" it complains like "But I thought Peter loved Paul." But it goes far further than that. You can have it parse a movie review, and ask for information about the movie: "Who directed Pulp Fiction? Who starred in it?"
Combined with a file storage solution, this would open quite interesting, new forms of computer file storage.
Re:What's wrong with hierachical systems anyway? (Score:5, Funny)
Re:What's wrong with hierachical systems anyway? (Score:5, Insightful)
1. Paths tend to get long.
2. You have to be careful of your "current path". Some apps have weird defaults and if you're not careful, you end up with your file in a strange location.
3. Some items do not fit into the hierarchical structure. Should my porn directory be organized into movies, stills and texts or perhaps perverted, spicy and nice? Whichever attribute I choose I will have trouble searching on the other.
Of course I can always use locate or find, but these tools only look at preset attributes (filename, last access date, substrings) and the solution from the article lets you specify your own attributes.
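To make the point about user-defined attributes concrete, here is a minimal sketch of an attribute database that can be searched on either axis. It is not newdocms's actual schema or API; the table layout, the `build_index` and `find` helpers, and the sample file names and attribute names are all assumptions made up for illustration.

```python
import sqlite3

def build_index(documents):
    """Load (path, {attribute: value}) pairs into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE attrs (path TEXT, name TEXT, value TEXT)")
    for path, attrs in documents:
        for name, value in attrs.items():
            db.execute("INSERT INTO attrs VALUES (?, ?, ?)", (path, name, value))
    return db

def find(db, **wanted):
    """Return sorted paths whose attributes match every name=value pair given."""
    paths = None
    for name, value in wanted.items():
        rows = db.execute(
            "SELECT path FROM attrs WHERE name = ? AND value = ?", (name, value)
        )
        hits = {row[0] for row in rows}
        paths = hits if paths is None else paths & hits
    return sorted(paths or [])

# Hypothetical sample data: each file carries two independent attributes.
docs = [
    ("a.avi", {"medium": "movie", "mood": "spicy"}),
    ("b.jpg", {"medium": "still", "mood": "spicy"}),
    ("c.txt", {"medium": "text", "mood": "nice"}),
]
db = build_index(docs)
print(find(db, mood="spicy"))                  # ['a.avi', 'b.jpg']
print(find(db, medium="still", mood="spicy"))  # ['b.jpg']
```

Neither attribute is privileged here, which is exactly what a single directory tree cannot give you without links.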
Re:What's wrong with hierachical systems anyway? (Score:2)
Should my porn directory be organized into movies, stills and texts or perhaps perverted, spicy and nice? Whichever attribute I choose I will have trouble searching on the other.
Errr, symlinks anyone, they fit in any number of categories. In fact it would be simple to create a file saving/organizing app that would automatically set up HFS links based on attributes that you define.
Of course I will try this new filesystem. I'll always try out anything new, but I do have some preliminary reservations. I'm happy with HFS, but I'm a coder. My linux desktop is 99% command line driven, I use ratpoison as my wm, so it may not work for me, but I certainly won't bash something that might work for others.
Use the right tool for the right job
Re:What's wrong with hierachical systems anyway? (Score:3, Insightful)
Well, in a good file system, you can make a set of directories like this: (since we're using porn as an example)
Some platforms are much better suited to doing this (unix), while others (Win) are not.
Now, having the ability to automatically generate the symbolic links would be nice.
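Generating those links automatically is not much code. The following is a rough sketch, assuming a flat `store` directory holding the real files and a `views` tree of symlinks built from attributes; the `link_by_attributes` helper, the directory names and the sample attributes are hypothetical, and it needs a platform where `os.symlink` is permitted.

```python
import os
import tempfile

def link_by_attributes(store, views, files):
    """Create views/<attribute>/<value>/<name> symlinks pointing into store.

    files maps a file name inside `store` to a dict of attributes.
    """
    for name, attrs in files.items():
        target = os.path.join(store, name)
        for attr, value in attrs.items():
            folder = os.path.join(views, attr, value)
            os.makedirs(folder, exist_ok=True)
            link = os.path.join(folder, name)
            if not os.path.lexists(link):   # do not clobber an existing link
                os.symlink(target, link)

# Demo in a throwaway directory; the file names and attributes are made up.
root = tempfile.mkdtemp()
store = os.path.join(root, "store")
views = os.path.join(root, "views")
os.makedirs(store)
for name in ("report.txt", "photo.jpg"):
    with open(os.path.join(store, name), "w") as handle:
        handle.write(name)

link_by_attributes(store, views, {
    "report.txt": {"project": "zambezi", "type": "text"},
    "photo.jpg": {"project": "zambezi", "type": "image"},
})
print(sorted(os.listdir(os.path.join(views, "project", "zambezi"))))
# ['photo.jpg', 'report.txt']
```

The weak spot is the one raised elsewhere in the thread: move or rename a file in `store` and every link to it dangles until the views are regenerated.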
Re:What's wrong with hierachical systems anyway? (Score:4, Interesting)
1. "Filesystem? I don't need no stinkin filesystem!" An ideal Palm-esque computing environment wouldn't have any filesystem. There simply isn't any reason for it. Why would you store addresses in an address file or a book report in a word file? Saving/Opening files should be transparent to the end user. Versioning should be built in, yet simple to understand. Forking files can be accomplished without copying a file. This is intuitively the simplest idea.
2. If you somehow *have* to think in terms of files, then your conclusion may be to use files. However, I don't see why anybody would come up with a hierarchical file system, unless they were accommodating hardware limitations. Placing files somewhere within a huge directory tree is just too darn complicated. Why should the same file not exist in multiple directories? Why should copies of a file exist? Everything, including advanced security policies (more advanced than what is currently possible) is available for a *keyword* driven filesystem.
I believe this is a step in the right direction and I can't wait until my favorite OS (not Linux) adopts a similar feature.
Re:What's wrong with hierachical systems anyway? (Score:2)
you mean osX doesn't have this "feature" yet? even microsoft has the broken search feature...
really, files, documents, directories, folders, whatever you want to call them you're saving a unit of work for later retrieval. this concept adds extra LAYERS to that save and retrieval process (keywords, and categories). and even the categories resemble folders/directories a LOT. so in effect, this merely adds meta-tags to the document header and allows searching on meta-tags across folders.
file versioning is a nice feature that i really liked in VMS, but you learn to be more careful in other systems that don't have that built in. instead of letting the OS automatically create backups, you do it yourself when it's needed.
Re:What's wrong with hierachical systems anyway? (Score:3, Insightful)
Let's see. If I want to retrieve a document that's been filed I go to the bank of file cabinets, select the cabinet that has the drawer I want, open the drawer, scan folders, pull envelope from correct one, extract document.
Cabinet/drawer/folder/envelope/document
Maybe it's because there really is an analog in meatspace for the hierarchical file system.
Intuitive (Score:4, Interesting)
1. (a) "We don't need no stinking filesystem." The ideal palmesque OS would have the same idea just demonstrated differently. You aren't going to open up your notepad to see an address. The address file is in the address program (directory). The schedule file is in the calendar program(directory). The programs you use to open the files become your folders.
1. (b) "Saving/Opening files should be transparent" The only people that would think like this in the real world have been living with someone that picks up after them all the time. When you are working on some (paper and pencil) project, and just stand up and walk away, do you expect it to be available at the office tomorrow? When you start working on several projects in succession on your desk, and have reams of loose paper, can you easily bore your way back down? No, reasonable, organized people pick up the project they are working on, file it away in the file cabinet/brief case/wherever it is supposed to go. There are logical beginnings and endings to your working on a project that only you can decide on. A spreadsheet, for example, do you want it to save every time you make a change... No, by their design, you would normally set up all your formulas, save that, and then every day/month/year open up the spreadsheet, plug the numbers, get the results, and save the specific results to a different file, or just look at the values produced. Not to mention, when you sit down at your desk in the morning, do you expect your desktop to know what project you want to work on? No, and you don't expect your computer to know what project you are working on either. Opening/Saving files shouldn't be and can't be transparent to the user.
I used to use a lot of floppies when growing up. I appropriated a lot of disks from other places. I used the "grab the black disk with the couple of remnant label pieces... no the other black disk... No, the one with the two small pieces of adhesive... Ooops, the one with the three pieces..." Now, I have to search all the disks every time I want anything off of them, because I never labeled them. Saving things in well-defined locations, for well-defined tasks, is a reasonable, intuitive, and necessary task to saddle a user of any system/technology/information with.
2. I don't really need to address this point specifically, since the answer is inherent in the points above. The overly large filesystems are part of a whole system that the user doesn't really need to know about. That is why the "Desktop/..." paradigm of Windows came about, and is so useful. People working on your word processor have a reason to put the font files in one directory, the plugins in another, and the preferences in a third. The user couldn't care less. If you start the user in a directory tree just for them, then they won't be stuck in a huge file system, and can still work in a fashion that has made sense for literally thousands of years.
The filesystem paradigm has been around for a long time, again literally thousands of years, because it works, it is easy, and it is how people think.
G:\Netowkrfilesystem\
Accounting\AccountsRecie
Re:What's wrong with hierachical systems anyway? (Score:3, Interesting)
I've been thinking along these lines for a couple years now. Suppose a computing appliance, perhaps handheld, or not, didn't have a filesystem. How would you make use of the hard disk?
Suppose the software saves everything in a memory-resident database. No filesystem, and no disk. Everything stays in memory. But it is virtual memory. Every page in memory has a reserved backing store page on the disk. The disk partition for this OS is just a big swap area. The total size of your usable "memory" is the swap area, not RAM. Now powering off the device becomes very fast. And so does powering on. No more "booting up" nonsense. You press the "off" button, and almost instantly the device is off. No matter how much data you have, or if you were in the middle of a huge unsaved word processing document, the device instantly powers off and back on again. No artificial concept of "saving" a file -- just like PalmOS. You don't "save" anything. In fact, no artificial concept of computer files. (For flamers: I'm not outlining a fully fleshed out implementation here, just some rough ideas, think different.)
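The "every page has a backing store page on disk" idea can be imitated in user space with a memory-mapped file. This is only a loose sketch of the concept, not how PalmOS or any real appliance works: the one-page `backing.store` file and the `remember` and `recall` helpers are invented for the example, and closing and reopening the mapping stands in for a power cycle.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE

# A one-page file stands in for the reserved backing store on disk.
path = os.path.join(tempfile.mkdtemp(), "backing.store")
with open(path, "wb") as handle:
    handle.write(b"\0" * PAGE)

def remember(text):
    """Write text into the mapped page; the caller never 'saves' a file."""
    data = text.encode("utf-8")
    with open(path, "r+b") as handle:
        with mmap.mmap(handle.fileno(), PAGE) as memory:
            memory[0:len(data)] = data
            memory.flush()   # push the dirty page out to its backing store

def recall(length):
    """Map the page again (as if after power-on) and read the bytes back."""
    with open(path, "r+b") as handle:
        with mmap.mmap(handle.fileno(), PAGE) as memory:
            return memory[0:length].decode("utf-8")

note = "half-written memo"
remember(note)
print(recall(len(note)))   # half-written memo
```

The program only ever touches "memory"; persistence is a side effect of where that memory lives.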
You can still move your stuff to other computers via "syncing" or whatever you want to call it. It's just that higher level concepts are copied, uploaded, downloaded, e-mailed, etc. rather than a file (i.e. collection of untyped, unlabeled bytes). I may move my mp3's, and they are still categorized by artist, recording, date, label, etc., etc..
I've also been thinking that a filesystem such as NTFS or ReiserFS that allows attaching huge amounts of metadata, or small amounts of metadata to any file would be important. For instance, my 4096x2048 digital photograph of the grand canyon (big file), should still be able to have a thumbnail (say about 128 KB) attached as metadata. Since the thumbnail is part of the "directory" information of the file, merely copying the file to another location retains all the metadata. (As opposed to Windows or KDE, where the thumbnail is another little hidden file somewhere near where the original file was stored.) Heck, I might want a graphic thumbnail metadata attached to an mp3 file. Of course, I suggest ReiserFS or NTFS because there should be no limit on the number of labeled metadata attachments, nor on their size. I should be able to attach metadata "Title":"Grand Canyon", "Type":"TIFF", or "Audio Clipping":<5 MB of audio data> just as easily. When I move the file, the metadata moves with it -- but the metadata is not seen in the primary information flow -- i.e. sequence of bytes -- that make up the "file" data.
As much as I hate Microsoft, I expect that it is they who will do stuff like this first. Ideas such as I am discussing here will encounter lots of resistance from the old school. Just look at the resistance to the topic of this article in this discussion. (I remember when we had to organize and save our files ourselves, and we used stupid extensions like ".jpeg" as the only metadata, and it was uphill both ways.)
Drifting to a different topic, I wonder if true innovations at higher levels come from us geeks? We put up with the most abysmal user interfaces for so long that we are not even capable of recognizing a bad user interface. We are comfortable with what we've got. I frequently see the attitude: if I can learn this stuff, then you can too. If you can't get under the hood of your 1920's car and fix it when it frequently has minor troubles, then you shouldn't be driving. Where I'm going with this is that it may take talented people who are being paid to build next generation interfaces who follow someone else's vision who is not constrained by the present.
Just some opinions. I should quit rambling now.
Re:What's wrong with hierachical systems anyway? (Score:3, Insightful)
Well, yes. On a computer, I would expect it to be available tomorrow *exactly* the way I left it. The only reason that I don't expect this in the real world is because it's not a feasible possibility. If it were, then I would expect it to be as I left it in the real world, too.
You commented: "My biggest concern with this new system is that if you fail to generate good keywords (I suspect this will be a big problem) it is going to be hard to browse through a likely directory to find the file."
Although I will admit that current searching technologies are not very good at determining what I actually want (e.g. misspellings, synonyms), I will say that I don't think that choosing keywords would be a problem. I believe that choosing which directory to place the file in is a more complicated problem, because you can only pick one place (without worrying about shortcuts or links). Many of the searchable keywords would be generated from the document itself: last-edited-today, various project keywords, application-based (e.g. excel-spreadsheet, letter), keywords based upon the content. Ultimately, I believe this system would be *more* tolerant of poor organization, rather than less as you state. I believe that people would adapt to it and learn to use good keywords more easily than they did for hierarchical filesystems. I will admit, however, that it is a flaw *whenever* people have to adapt to something, and most have already adapted to the idea of a hierarchical filesystem.
You also mention that the PalmOS filesystem implements such a filesystem poorly, but please don't crush the idea based upon one implementation. I see NO reason that application developers would have to worry about implementing a keyword filesystem any more than they would a hierarchical filesystem. It sounds like Palm's version isn't mature enough to be useful.
The industry is working to remove the hierarchical filesystem. It's only a matter of time. Look at WindowsXP Tablet edition's note-taking program. You basically have one *file* for all of your notes... ever. You can subdivide and categorize these notes, but it's all one file.
Re:What's wrong with hierachical systems anyway? (Score:3, Interesting)
Interestingly enough... (Score:5, Insightful)
The HFS filesystem (not generic hierarchical filesystems, but the Mac filesystem) works this way; each file has a unique ID, and everything -including name and even path- is just a piece of metadata. This is how they make "aliases" work; they're similar to softlinks but they don't break when you move the original, because they reference the file by ID rather than by path.
In essence, they overlay a familiar hierarchical system on a more database-like structure. The concept has been scoffed at by other filesystem developers for ages, though, and unfortunately there's no really good interface to HFS' metadata systems, or something could be written to do this same thing on the filesystem as it exists now.
Re: OS/2 had this as well. (Score:3, Interesting)
The coolest part was the extent of metadata that could be added as extended attributes... probably close to a solid kilobyte. Even better was how compatibility with FAT volumes was maintained: the extended attributes were stored in a file in the root directory. Of course, if that file was lost you had a serious problem.
Ahh, the memories.
Re:Interestingly enough... (Score:4, Informative)
It's unfortunate that you haven't studied FS design yourself, then, before criticizing people who design them.
How do you think a hard link works?
Now on the other hand, it would be nice if there was some way to have unique inode IDs across multiple file systems, but that's a MUCH more difficult problem.
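For anyone who wants to see the hard link point rather than take it on faith, here is a small demonstration. It assumes a POSIX-style filesystem where `os.link` and `os.symlink` are available; the file names are made up.

```python
import os
import tempfile

root = tempfile.mkdtemp()
original = os.path.join(root, "report.txt")
with open(original, "w") as handle:
    handle.write("yearly report")

hard = os.path.join(root, "hard-link.txt")
soft = os.path.join(root, "soft-link.txt")
os.link(original, hard)      # a second name for the same inode
os.symlink(original, soft)   # stores the path, not the inode

# Move the original; the hard link does not care, the symlink does.
moved = os.path.join(root, "archive.txt")
os.rename(original, moved)

same_inode = os.stat(hard).st_ino == os.stat(moved).st_ino
with open(hard) as handle:
    hard_still_reads = handle.read() == "yearly report"
soft_is_broken = not os.path.exists(soft)   # exists() follows the dangling link
print(same_inode, hard_still_reads, soft_is_broken)   # True True True
```

The inode number plays the role of the unique file ID, but only within one filesystem, which is the limitation noted above.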
Re:What's wrong with hierachical systems anyway? (Score:2, Interesting)
Just because it works fine doesn't mean something else isn't going to work better.
Re:What's wrong with hierachical systems anyway? (Score:2)
Documents/projects/Zambezi. Group files by substance, not by type. To give a coding example, ~/src/c/helloworld.c is bad whereas ~/src/helloworld/c/helloworld.c is better.
Of course, with symbolic links/shortcuts/aliases there's no reason why it can't appear in both.
Cheers,
Ian
Re:What's wrong with hierachical systems anyway? (Score:2)
Where should the Order to Bob's Builders for the first phase of Roofing work on the Castle Vale project go?
I can guarantee that in a real work situation different people will have different thoughts as to the best way to store the documents, and very good reasons within the paradigms they work by for sorting/grouping them that way. To take the example in your comment: whilst it might be sensible for you to group data by project, for someone working in accounts it would be sensible to group them by type, as they want to see all the accounts files together; in the same way, the order processing department wants to see all the orders together, accounts receivable want to see all the invoices that have been issued together, and a programmer working on multiple projects wants to see all the specification documents for the projects they're assigned to together.
To be efficient different people need to view the same sets of documents (or subsets) in different ways. A good DMS will let them do that without having to manually maintain links/shortcuts. Ideally it will also link into things like the email system, LDAP and applications/databases to provide a single consistent view of all the relevant data that a person needs to do their job, manage security on documents and provide versioning, change control/authorisation, auditing and flexible, usable metadata.
Stephen
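The "same documents, different views" requirement in the comment above can be illustrated in a few lines. This is only a toy, not how any particular DMS works: the `view_by` helper, the file names, and the second project ("Riverside") are invented for the example.

```python
from collections import defaultdict

def view_by(documents, attribute):
    """Group document names by one attribute, without moving or linking files."""
    groups = defaultdict(list)
    for name, attrs in documents.items():
        groups[attrs.get(attribute, "(unset)")].append(name)
    return {key: sorted(names) for key, names in groups.items()}

# Hypothetical metadata for three documents.
documents = {
    "castle-vale-roofing-order.doc": {"project": "Castle Vale", "type": "order"},
    "castle-vale-invoice-001.doc": {"project": "Castle Vale", "type": "invoice"},
    "riverside-order.doc": {"project": "Riverside", "type": "order"},
}

# The project manager's view and the order processing department's view.
print(view_by(documents, "project")["Castle Vale"])
print(view_by(documents, "type")["order"])
```

Each department gets its grouping from the same metadata, so nobody has to maintain links by hand.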
Re:What's wrong with hierachical systems anyway? (Score:2)
Fair enough, I did oversimplify by using Hello World.
A better example would BigProject, a multilanguage affair. You would then have:
~/src/BigProject/c/BigProject.c
~/src/BigProject/c/BigProjectFeature.c
~/src/BigProject/perl/launchBigProject.pl
That shows the reason - it is clear that all the files are to do with BigProject, and underneath that it is also clear what type of files BigProject requires.
symlinks are more trouble than they're worth for files that have the possibility of moving
Yes, under straight Unix that's true. MacOS's aliases were a revelation however - you could move the target file even to a removable volume and the alias would still understand where to point. I haven't used OS X so I don't know if that feature has been retained. Windows shortcuts follow the Unix model, I know.
Cheers,
Ian
Re:What's wrong with hierachical systems anyway? (Score:3, Informative)
Depends what you're using them for. It is quite often useful to be able to group documents by different attributes, to have the same document appear in multiple places or to attach searchable metadata to documents.
For example if you work on multiple projects:
Hierarchical filesystems are fine for storing documents, they're often not the best mechanism for retrieving them. Also they're not a good fit for the way a lot of non-technical people think about their documents. A good DMS will map metadata and RDBMS style access patterns over a hierarchical storage system.
Stephen
Re:What's wrong with hierachical systems anyway? (Score:3, Informative)
For example, where I work, the files created often apply to multiple subjects or disciplines, and sometimes span departments. As a result, trying to create directories for each discipline or subject or department doesn't always work because some information legitimately belongs in more than one of them.
The only options that I can think of are to put copies of those files in each appropriate directory (which would be a Very Bad Thing, IMHO), or to try to create and maintain links to the files, which is not feasible given that there are hundreds of users and tens of thousands of files.
In cases like this, I've often wished there were a way to assign searchable properties/attributes to files, which would make it easier to browse through them aside from just by their location.
That's not to say that we would do away with the hierarchical structure, or that having searchable "attributes" would solve all of the problems; but, it would be an enhancement to the current system.
So, personally, I'm very interested to hear about this development.
Re:Well now, hold your horses (Score:3, Interesting)
Read the article first.... (Score:2)
Re:SQL does not cut it (Score:4, Insightful)
That was done pre-UNIX with PICK. The whole O/S was a database.
Microsoft has been working on an Object File System for years and it is rumored that it might finally ship in Yukon.
A database-backed file system is a great idea for an O/S. But the relational model is long overdue for the garbage pail. Modern programming languages since C have used pointers or object references. If JOIN and messing around with tables is so good why don't we all use COBOL?
One of the things that appeared in VMS a while back that was pretty cool (and pretty easy to do on a log based file system) was transactions at the file level. You could take any set of file I/O operations and wrap a transaction around them. This meant that you could have atomic updates to any file-based resource without having to suffer the pain of SQL.
It would be pretty easy to implement this on a Linux log based file system (or windows for that matter). All you do is extend the log structure so you can group operations together and implement some sort of commit flag.
You could then build an object oriented filestore database using XML flat files. OK so maybe the system is not going to be up to storing millions of records without more infrastructure. However most programming tasks use configuration files that are unlikely to be more than a few tens of Kb and are routinely managed as in memory structures anyway.
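The "group operations together and add a commit flag" idea from this comment can be approximated in user space. What follows is a toy sketch, not the VMS mechanism and not how a journaling filesystem does it in the kernel: the `FileTransaction` class, the `replay` function and the JSON journal format are all invented here, and it only handles whole-file writes of small text files.

```python
import json
import os
import tempfile

class FileTransaction:
    """Stage whole-file writes, log them, then apply once a commit flag is on disk."""

    def __init__(self, journal_path):
        self.journal_path = journal_path
        self.pending = {}

    def write(self, path, text):
        self.pending[path] = text            # nothing touches the target yet

    def commit(self):
        record = {"writes": self.pending, "committed": True}
        tmp = self.journal_path + ".tmp"
        with open(tmp, "w") as log:
            json.dump(record, log)
            log.flush()
            os.fsync(log.fileno())
        os.replace(tmp, self.journal_path)   # atomic rename is the commit point
        replay(self.journal_path)

def replay(journal_path):
    """Apply a committed journal; safe to run again after a crash mid-apply."""
    if not os.path.exists(journal_path):
        return
    with open(journal_path) as log:
        record = json.load(log)
    if record.get("committed"):
        for path, text in record["writes"].items():
            with open(path, "w") as target:
                target.write(text)
    os.remove(journal_path)

# Demo: two config files updated as one unit (names are made up).
root = tempfile.mkdtemp()
ledger = os.path.join(root, "ledger.cfg")
index = os.path.join(root, "index.cfg")
tx = FileTransaction(os.path.join(root, "journal"))
tx.write(ledger, "balance=10")
tx.write(index, "entries=1")
staged_only = not os.path.exists(ledger)     # True: nothing applied before commit
tx.commit()
```

A program using this would call `replay` on startup, so a crash between the rename and the last target write gets finished rather than left half-applied. Readers can still observe an intermediate state while the writes are being applied, which a real filesystem-level transaction would hide.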
Re:This sorta rings a bell.... (Score:2)
"On the Mac, I was used to double-clicking the hard drive, then navigating through the directories to the application I wanted to use, and launching it from there"
plus, it's not like they took that option away...no one has stopped you from double-clicking on "My Computer" and navigating into any of the hard drives...